A Workload-Adaptive Streaming Partitioner for Distributed Graph Stores
Abstract
Streaming graph partitioning methods have recently gained attention due to their ability to scale to very large graphs with limited resources. However, many such methods do not consider workload and graph characteristics. This may degrade the performance of queries by increasing inter-node communication and computational load imbalance. Moreover, existing workload-aware methods cannot consistently provide good performance, as they do not consider the dynamic workloads that keep emerging in graph applications. We address these issues by proposing a novel workload-adaptive streaming partitioner named WASP, which aims to achieve low-latency and high-throughput online graph queries. As each workload typically contains frequent query patterns, WASP exploits the existing workload to capture active vertices and edges, which are frequently visited and traversed, respectively. This information is used to heuristically improve the quality of partitions, either by avoiding the concentration of active vertices in a few partitions, proportional to their visit frequencies, or by reducing the probability of cutting active edges, proportional to their traversal frequencies. In order to assess the impact of WASP on a graph store and to show how easily the approach can be plugged on top of the system, we exploit it in a distributed graph-based RDF store. Our experiments over three synthetic and real-world graph datasets and the corresponding static and dynamic query workloads show that WASP achieves better query performance against state-of-the-art graph partitioners, especially in dynamic query workloads.
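The abstract does not give WASP's actual scoring function, so the following is only a minimal sketch of the general idea it describes: a greedy streaming partitioner that prefers to keep frequently traversed edges uncut while spreading frequently visited vertices across partitions. The function name `stream_partition`, the parameters `visit_freq`, `trav_freq`, and `capacity`, and the particular score formula are all assumptions made for illustration, not the authors' method.

```python
def stream_partition(stream, k, visit_freq, trav_freq, capacity):
    """Greedy one-pass partitioner (illustrative, not WASP itself).

    stream     -- iterable of (vertex, neighbors) in arrival order
    k          -- number of partitions
    visit_freq -- hypothetical map: vertex -> how often queries visit it
    trav_freq  -- hypothetical map: frozenset({u, v}) -> traversal count
    capacity   -- maximum number of vertices per partition
    """
    assignment = {}
    size = [0] * k      # vertices stored in each partition
    heat = [0.0] * k    # summed visit frequency in each partition

    for v, neighbors in stream:
        total_heat = sum(heat) + 1.0  # +1 avoids division by zero
        best_key, best_p = None, None
        for p in range(k):
            if size[p] >= capacity:
                continue
            # Reward: edges to neighbors already in p, weighted so that
            # frequently traversed edges are less likely to be cut.
            affinity = sum(
                1.0 + trav_freq.get(frozenset((v, u)), 0.0)
                for u in neighbors
                if assignment.get(u) == p
            )
            # Penalty: shrink the reward for partitions that are already
            # full or already hold a large share of the active vertices.
            penalty = (1.0 - size[p] / capacity) * (1.0 - heat[p] / total_heat)
            # Ties fall back to the coolest, then smallest, then lowest id.
            key = (affinity * penalty, -heat[p], -size[p], -p)
            if best_key is None or key > best_key:
                best_key, best_p = key, p
        if best_p is None:
            raise ValueError("all partitions are at capacity")
        assignment[v] = best_p
        size[best_p] += 1
        heat[best_p] += visit_freq.get(v, 0.0)
    return assignment
```

In this sketch the frequency maps are fixed inputs; a workload-adaptive system would presumably refresh them as the query workload changes, which is outside the scope of the example.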
Similar resources
A Scalable Distributed Graph Partitioner
We present Scalable Host-tree Embeddings for Efficient Partitioning (Sheep), a distributed graph partitioning algorithm capable of handling graphs that far exceed main memory. Sheep produces high quality edge partitions an order of magnitude faster than both state of the art offline (e.g., METIS) and streaming partitioners (e.g., Fennel). Sheep’s partitions are independent of the input graph di...
Workload-aware Streaming Graph Partitioning
Partitioning large graphs, in order to balance storage and processing costs across multiple physical machines, is becoming increasingly necessary as the typical scale of graph data continues to increase. A partitioning, however, may introduce query processing latency due to inter-partition communication overhead, especially if the query workload exhibits skew, frequently traversing a limited su...
GraSP: Distributed Streaming Graph Partitioning
This paper presents a distributed, streaming graph partitioner, Graph Streaming Partitioner (GraSP), which makes partition decisions as each vertex is read from memory, simulating an online algorithm that must process nodes as they arrive. GraSP is a lightweight high-performance computing (HPC) library implemented in MPI, designed to be easily substituted for existing HPC partitioners such as P...
Social Hash Partitioner: A Scalable Distributed Hypergraph Partitioner
We design and implement a distributed algorithm for balanced k-way hypergraph partitioning that minimizes fanout, a fundamental hypergraph quantity also known as the communication volume and (k − 1)-cut metric, by optimizing a novel objective called probabilistic fanout. This choice allows a simple local search heuristic to achieve comparable solution quality to the best existing hypergraph par...
Adaptive Algorithms for Managing a Distributed Data Processing Workload
Workload management, a function of the OS/390 operating system base control program, allows installations to define business objectives for a clustered environment (Parallel Sysplex™ in OS/390). This business policy is expressed in terms that relate to business goals and importance, rather than the internal controls used by the operating system. OS/390 ensures that system resources are assign...
Journal
Journal title: Data Science and Engineering
Year: 2021
ISSN: 2364-1541, 2364-1185
DOI: https://doi.org/10.1007/s41019-021-00156-2